BED - Batch EDitor VERSION 1.1
A data reformatting utility
(c) 1985 by Ken Goosens
Notice: this program is distributed free. You are free to use and
distribute it provided:
(1) no fee or other consideration is charged.
(2) the program is distributed only in unmodified form.
(3) you accept all responsibility for using this program. The
author does not provide any guarantee that this program works
properly and assumes no liability for use of it.
This program is supported by its author. Please send any comments or
enhancements to:
Ken Goosens
5020 Portsmouth Road
Fairfax, VA 22032
Or call Ken's bulletin board system at 202-537-7407 to leave a message
or download the latest version.
A complete set of files consists of:
BED.DOC - this documentation
BED.EXE - compiled, executable code
BED.BAS - main program
BEDLIB.BAS - auxiliary BASIC routines used by BED.BAS, separately
compiled
BED.LIB - compiled assembler routines used by BED
REPORT.DAT - a sample report. We want to strip the report down to
pure data, omit commas from numbers, and rearrange
dates to year-month-day format with no separator
between the date elements.
REPORT.SPC - a sample editing specification that works on
REPORT.DAT
REPORT.BAD - a list of phrases whose presence should exclude a
line from the output. Needed by REPORT.SPC.
Acknowledgement - this program uses some assembler routines
distributed by Tom Hanlin in ADVBAS.LIB. ADVBAS is an excellent
shareware product that includes many useful assembler routines.
7 December 1985
* * * * * * * * * * * * * * * * *
TABLE OF CONTENTS
What is BED?
What Advantages does BED have over other Editors?
How Can BED be Used?
Exactly What Edits can BED do?
How to Invoke BED
Explanation of Each Editing Specification
The Order of Edits
How to Recompile BED
* * * * * * * * * * * * * * * * *
What is BED?
BED is a batch editor. EDITOR - because it modifies lines of text in
a file. BATCH - because it runs in batch mode rather than
interactively. The only thing you specify interactively is which edits
you want done.
What Advantages does BED have over other Editors?
o Runs batch.
BED is designed to run entirely unattended after you set up a
configuration file telling BED what to do. This makes it ideal for
production work where files are edited repeatedly the same way.
o Speed.
BED runs very fast. "Interactive" editors often include a macro
facility that lets them run in batch mode too. But for each edit they
usually page in as much of the file as fits in RAM, do the edit, and
move on to the next part, so each edit makes a full pass through the
data file. BED makes exactly one pass through the data file, no matter
how many edits are done. BED is also much more efficient at global
search-and-replace than most editors.
o Does complex reformatting simply.
BED is designed to make some complex edits very easy, such as
reformatting date fields (e.g. removing separators and rearranging
fields, such as MM/DD/YY to YYMMDD) and changing formatted numbers
(dollar signs, parentheses around negative numbers, commas separating
thousands) to a "pure" data format (e.g. "($89,655.21)" to
"-89655.21"). BED also will preserve the original field length when
doing such edits by inserting filler blanks. These edits are either
very difficult or impossible using other editors.
o No theoretical or practical limit on file size.
Most editors either will not load large files or take forever to load
them before editing can begin, so that the time it takes to edit files
grows geometrically with file size. BED works just as efficiently on
large files as on small.
o Maximum line length of 32,676.
Most editors have a maximum line size of 256 characters or less. BED
will edit individual lines up to 32,676 characters long.
o Supports criteria for excluding entire lines.
Few editors allow you to select what lines to delete from a file. BED
allows entire lines to be omitted based on length or the presence of
keywords. Line exclusion criteria are applied before any other edits
are done.
o BED is freeform.
This means that the text you edit does not have to be in any special
location or column. BED will find the text to be edited no matter
where it occurs in the file or line. Data base editors that allow you
to reformat data always require that the data be broken into fields
that are fixed in length, in order, or both. BED does not impose a
structure on the data. You do not have to divide your file into fields
and records, nor do you have to tell BED where in the file the data is
located.
o Source code is provided.
You can fix bugs in BED or enhance it to do new tasks. Commercial
editors never give you the source code.
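The contrast drawn under "Speed" between one pass per edit and a single pass for all edits can be sketched in Python. This is only an illustration of the idea, not BED's own code (BED is written in compiled BASIC); the names one_pass and pass_per_edit, and the representation of an edit as a function from one line to its edited form, are assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption for this sketch: an edit is any function mapping one line to its edited form.
Edit = Callable[[str], str]


def one_pass(lines: Iterable[str], edits: List[Edit]) -> Iterator[str]:
    """Apply all edits to each line as it is read: one pass over the data."""
    for line in lines:
        for edit in edits:  # every edit runs while this line is in memory
            line = edit(line)
        yield line


def pass_per_edit(lines: List[str], edits: List[Edit]) -> List[str]:
    """For comparison: a macro-style run that sweeps the whole data once per edit."""
    for edit in edits:
        lines = [edit(line) for line in lines]
    return lines


data = ["Total: $1,200", "Total: $3,450"]
edits: List[Edit] = [lambda s: s.replace("$", ""), lambda s: s.replace(",", "")]
print(list(one_pass(data, edits)))
print(pass_per_edit(data, edits))
```

Both functions give the same result. The difference is that one_pass reads each input line exactly once, so it can consume a file iterator of any size while holding only the current line, whereas pass_per_edit traverses all of the data once for every edit.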
How Can BED be Used?
BED's primary use is to
o prepare data for reading into other programs.
BED is basically a data preparation utility for "cleaning",
"scrubbing", or reformatting data.
Often data needs to be changed before it can be loaded into a data
base management system for analysis and reporting. A typical problem
is that data contains characters that make it easier for the human eye
to read, but which a data base management system cannot accept,
including
o "non-numeric" characters in numeric fields, such as a dollar
sign, comma, or parentheses around negative numbers
o dates with separators between the month, day, and year, and the
fields in the wrong order.
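Stripping such characters from numeric fields (for example "($89,655.21)" to "-89655.21", with filler blanks keeping the original field length, as described under "Does complex reformatting simply") can be sketched as a regular-expression substitution in Python. The pattern, the function names, and the choice to put the filler blanks on the left are all assumptions for illustration; this document does not say where BED places the blanks.

```python
import re

# Hypothetical rule: an optional "(" (meaning negative), an optional "$", digits with
# optional thousands commas, optional decimals, and a ")" only if "(" was matched.
FORMATTED_NUMBER = re.compile(r"(\()?\$?(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?(?(1)\))")


def to_pure_number(match: re.Match) -> str:
    """Rewrite one formatted number as pure data of the same width."""
    sign = "-" if match.group(1) else ""
    digits = match.group(2).replace(",", "") + (match.group(3) or "")
    # Assumption: filler blanks go on the left, keeping numbers right-justified.
    return (sign + digits).rjust(len(match.group(0)))


def unformat_numbers(line: str) -> str:
    """Apply the rule to every formatted number found anywhere in the line."""
    return FORMATTED_NUMBER.sub(to_pure_number, line)


print(repr(unformat_numbers("Net income: ($89,655.21)")))
print(repr(unformat_numbers("Sales: $1,200.00 total")))
```

Like any pattern-based rule, this one rewrites every parenthesized number it finds, so "(3)" would become " -3"; in practice such a rule would need to be limited to the data fields.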
It is always good data processing policy never to store data with
formatting in it, and to add formatting only in reports.
Unfortunately, this policy is sometimes violated, and sometimes the
only available dump of the data is a report (spreadsheets are among
the worst offenders). The presence of "formatting" characters in data
fields usually means that other programs cannot read the data.
Reports also typically include page headers, titles, page numbers,
blank lines, and printer pagination commands (form feeds), all of
which need to be stripped out to leave pure data. BED is expressly
designed to eliminate such formatting. For example, data managers
usually store dates in YYMMDD format so that they will sort properly,
and they will usually batch load dates only in that format. Yet the
most common format for dates in the United States is MM-DD-YY. BED
allows you to strip off the dashes and rearrange the elements of the
date.
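Rearranging MM-DD-YY dates into YYMMDD, wherever they occur in a line, can be sketched the same way in Python. The pattern and the function name to_yymmdd are assumptions for illustration: the sketch accepts either "-" or "/" as the separator but requires the same one in both places, does not check that the month and day are valid, and does not pad the result back to the original field length.

```python
import re

# Hypothetical pattern: MM, DD and YY as two digits each, separated by "-" or "/";
# the backreference \2 requires the same separator in both places.
US_DATE = re.compile(r"\b(\d{2})([-/])(\d{2})\2(\d{2})\b")


def to_yymmdd(line: str) -> str:
    """Rewrite every MM-DD-YY (or MM/DD/YY) date in the line as YYMMDD."""
    def rearrange(m: re.Match) -> str:
        month, _sep, day, year = m.groups()
        return year + month + day  # no validation of month or day values
    return US_DATE.sub(rearrange, line)


print(to_yymmdd("Invoice 12-03-85 paid 01/15/86"))
```

Because the match is anchored on word boundaries, digits inside longer runs (for example the "85-12-03" inside "1985-12-03") are left alone.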
Exactly What Edits can BED do?
o Exclude entire lines
-that are shorter than a minimum length
-that are longer than a maximum length
-that contain any of a specified list of strings
Typical use: you must load data that came from a report and contains
blank lines and page headers. You must strip out these lines to leave
the pure data. You tell BED to exclude empty lines (shorter than 1)
and lines with "PAGE" in them. Form feeds, used by most printers to
cause a page eject, are automatically stripped out when using the
delete short line option.
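The three exclusion tests, together with the form-feed handling, can be sketched as a Python filter that would run before any other edit, as BED's exclusion criteria do. The function name exclude_lines and its parameters are hypothetical; bad_phrases plays the role of a phrase list such as REPORT.BAD. This document does not say exactly how BED strips form feeds, so the sketch assumes they are removed from the line before the minimum-length test, making a line that held only a page eject count as empty. Phrase matching here is case-sensitive, which is also an assumption.

```python
from typing import Iterable, Iterator, Optional, Sequence


def exclude_lines(
    lines: Iterable[str],
    min_len: Optional[int] = None,
    max_len: Optional[int] = None,
    bad_phrases: Sequence[str] = (),
) -> Iterator[str]:
    """Yield only the lines that survive the length and phrase tests (hypothetical API)."""
    for line in lines:
        line = line.rstrip("\r\n")
        if min_len is not None:
            # Assumption: form feeds are stripped before measuring, so a line that
            # held only a page eject is treated as empty.
            line = line.replace("\f", "")
            if len(line) < min_len:
                continue
        if max_len is not None and len(line) > max_len:
            continue
        # Case-sensitive substring test against each excluded phrase.
        if any(phrase in line for phrase in bad_phrases):
            continue
        yield line


report = ["\fPAGE 1", "", "ACME 12-03-85 ($89,655.21)", "\f", "WIDGETS 01/15/86 $1,200.00"]
for kept in exclude_lines(report, min_len=1, bad_phrases=["PAGE"]):
    print(kept)
```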
o Global search and replace
-convert lette